Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors
نویسندگان
چکیده
Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has shown to be sufficient for traditional low dimensional local features, e.g., SIFT; and typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. In order to improve the modeling capacity for high dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximation, this model can be converted to a sparse coding procedure and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves the state-ofthe-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
منابع مشابه
Efficient Local Feature Encoding for Human Action Recognition with Approximate Sparse Coding
Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations that can be used in learning and recognition. In this paper, we present an efficient local feature encoding approach, which is called Approximate Sparse Coding (ASC). ASC computes the...
متن کاملDeep Tensor Encodings
Learning an encoding of feature vectors in terms of an over-complete dictionary or a information geometric (Fisher vectors) construct is wide-spread in statistical signal processing and computer vision. In content based information retrieval using deep-learning classiers, such encodings are learnt on the aened last layer, without adherence to the multi-linear structure of the underlying feat...
متن کاملLocal Patch Vectors Encoded by Fisher Vectors for Image Classification
The objective of this work is image classification, whose purpose is to group images into corresponding semantic categories. Four contributions are made as follows: (i) For computational simplicity and efficiency, we directly adopt raw image patch vectors as local descriptors encoded by Fisher vector (FV) subsequently; (ii) For obtaining representative local features within the FV encoding fram...
متن کاملImage classification using spatial pyramid robust sparse coding
Recently, the sparse coding based codebook learning and local feature encoding have been widely used for image classification. The sparse coding model actually assumes the reconstruction error follows Gaussian or Laplacian distribution, which may not be accurate enough. Besides, the ignorance of spatial information during local feature encoding process also hinders the final image classificatio...
متن کاملHierarchical Coding Vectors for Scene Level Land-Use Classification
Land-use classification from remote sensing images has become an important but challenging task. This paper proposes Hierarchical Coding Vectors (HCV), a novel representation based on hierarchically coding structures, for scene level land-use classification. We stack multiple Bag of Visual Words (BOVW) coding layers and one Fisher coding layer to develop the hierarchical feature learning struct...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014